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1.  Introduction 


A  random  number  generator  (RNG)  is  an  algorithm  with  output  that  is  in  some  sense 
statistically  indistinguishable  from  a  random  sample.  This  is  an  essential  component 
of  any  stochastic  simulation  (such  as  MUVES),  which  relies  on  the  availability  of 
independent  random  quantities  for  statistical  validity.  Parallel  (distributed)  process¬ 
ing  adds  another  layer  to  this  requirement.  The  sets  of  random  quantities  in  distinct 
threads  must  also  be  independent  of  each  other.  Otherwise,  no  sets  of  quantities 
within  or  among  threads  can  be  claimed  to  be  a  random  sample. 

The  evolution  of  random  number  generation  in  MUVES  over  the  past  25  years, 
from  1990  to  2015,  proceeds  from  a  short-period  low-resolution  single-threaded 
implementation  with  questionable  numerical  and  statistical  properties.  This  legacy 
system  is  documented  in  Section  2. 

The  modem  concept  presented  in  Sections  3  and  4  overcomes  all  the  shortcom¬ 
ings  of  the  legacy  system  and  provides  independent  sets  of  random  numbers  for 
thread- safe  parallel  processing.  Details  of  the  MUVES  implementation  develop¬ 
ment  through  software  change  requests  are  seen  in  Sections  5,  6,  7,  and  8.  Working 
algorithm  code  and  a  procedure  for  verification  of  the  statistical  independence  prop¬ 
erties  are  in  Section  9. 

This  report  traces  the  history  of  random  number  generation  in  MUVES,  includ¬ 
ing  the  mathematical  basis  and  statistical  justification  for  algorithms  used  in  the 
code.  The  working  code  provided  produces  results  identical  to  the  current  imple¬ 
mentation.  These  theoretical  and  practical  details  enable  the  reader  to  understand 
the  algorithms  and  ensure  that  future  enhancements  to  the  production  code  preserve 
the  integrity  of  the  system. 

2.  Legacy 

Parallel  processing  was  not  a  design  consideration  when  MUVES  development  be¬ 
gan.  So  the  legacy  RNG  was  inherently  single-threaded.  The  main  concern  was 
portability,  but  the  legacy  RNG  is  deficient  in  several  other  respects.  Nonetheless, 
it  is  instructive  to  consider  its  construction  and  operation.  Sections  2. 1-2.5  describe 
the  mathematics  behind  the  legacy  RNG  implementation,  and  point  out  undesirable 
features  and  areas  that  need  improvement. 
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2.1  Integers 


Integer  addition,  subtraction,  and  multiplication  are  intrinsic,  but  division  is  char¬ 
acterized  by  the  division  algorithm:  one  can  divide  any  a  by  any  nonzero  m  to  get  a 
unique  quotient  q  and  the  remainder  r.  To  be  specific, 

ia  Vm  >  0  □  \q  3!r  :  a  =  qm  +  r  ,  0  <  r  <  m  .  (1) 

(The  quotient  and  remainder  are  made  unique  by  the  bounds  on  r.)  The  remainder 
or  “mod”  operator  expresses  this  relationship,  and  we  say  that  “a  mod  m  equals 
r”,  denoted  a%m  =  r.  When  the  remainder  is  0,  we  say  that  “m  divides  a”,  denoted 
m  |  a. 

Some  useful  properties  of  the  remainder  are 

(< a  ±  b)%m  =  ( a%m  ±  b%m)%m  (2) 

which  follows  from  a  ±  b  =  (qa  ±  qi,)m  +  (ra  +  r/, ), 

(ab)%m  =  ( a%m  ■  b%m)%m  (3) 


which  follows  from  ab  =  ( qam  +  ra)(qi,m  +  r/,j  =  qm  +  rar/},  and 

a  a%(bm) 

b  |  a  =>  —%m  =  - - - 

b  b 


(4) 


which  follows  from  a/b  =  qm  +  r,  a  =  q(bm)  +  br,  and  a%(bm)  =  hr . 

2.2  Linear  Congruential  Generator 

The  Linear  Congruential  Generator  (LCG)  is  a  basic  RNG  is  defined  by  a  linear 
recurrence  modulo  m 

xi+ 1  =  T(xi)  =  (axi  +  c)  %  m  (5) 

with  integer  and  real  output  functions  z.(x,)  and  u(z.i).  The  RNG  state  x  is  not  nec¬ 
essarily  the  same  thing  as  the  integer  output  z(x). 
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An  LCG  has  full  period  m,  obtaining  all  values  in  0, . . .  ,m  -  1,  if  and  only  if 


m  and  c  are  relatively  prime 
V  prime  q  :  q  \  m  =>  q  \  (a  -  1) 


4  |  m  =>  4  |  (a  -  1) . 


(6) 


The  MUVES  legacy  RNG  is  a  full-period  LCG  with 


m  =  232 

a-  1  =  1103515244  =  22  •  132  •  613  •  2663 


c  =  12345  =  3  •  5  •  823  . 


(7) 


The  integer  output  function  is 


z  =  (*/65536)%32678  =  (x  »  16)  &  0x7fff , 


(8) 


giving  a  15-bit  integer,  0  <  z  <  215  -  1  =  32767 .  “Right  shift”  is  and  “bitwise 


and”  is  “&.' 


The  real  output  function  is 


u  =  z/32768  =  z/ 215, 


(9) 


approximating  U(0, 1)  with  4  fully  significant  digits  and  resolution  s  «  3  x  10  5. 

The  integer  output  z  is  the  same  as  rand() ,  the  C  library  standard  RNG  in  Kernighan 
and  Ritchie The  “rand”  man  page  offers  a  warning  about  this  function. 

The  versions  of  rand()  and  srand()  in  the  Linux  C  Library  use  the  same  random 
number  generator  as  random(3)  and  srandom(3),  so  the  lower-order  bits  should  be 
as  random  as  the  higher-order  bits.  However,  on  older  rand()  implementations,  and 
on  current  implementations  on  different  systems,  the  lower-order  bits  are  much  less 
random  than  the  higher-order  bits.  Do  not  use  this  function  in  applications  intended 
to  be  portable  when  good  randomness  is  needed.  (Use  random(3)  instead.) 

See  Collins2-  for  randomness  test  results,  including  the  failure  of  this  function. 
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2.3  Independent  RNG  Streams 


Any  RNG  cycle  can  be  used  to  create  statistically  independent  RNGs  by  2  funda¬ 
mental  methods.  The  partition  method  is  to  divide  the  cycle  into  non-overlapping 
streams  of  consecutive  elements,  considered  to  be  mutually  independent.  This  is 
accomplished  by  advancing  (jumping)  the  RNG  in  fixed  increments  (jumps)  to  ob¬ 
tain  the  stream  starting  points.  To  get  n  streams  from  a  RNG  with  period  m,  use  the 
jump  m/n.  The  other  method  is  leapfrog,  where  the  n  stream  starting  points  are  con¬ 
secutive  states  5i,. . .  ,sn  from  the  RNG  cycle.  Then  the  RNG  is  used  with  a  jump 
of  n  in  each  stream  to  obtain  non-intersecting  interleaved  streams,  considered  to  be 
mutually  independent.  In  either  case  the  effective  useful  stream  length  or  period  is 
m/n,  which  is  exhausted  when  one  stream  collides  with  another. 

MUVES  uses  the  partition  method  and  8  streams,  4  denoted  “Shot  Pattern”,  “Clus¬ 
ter”,  “Shot  Assessment”,  “BAD”,  and  4  unused.  First,  seed  the  RNG  with  a  32-bit 
integer  .s  i ,  this  is  stream  1.  Then  the  other  7  streams  are  obtained  by  jumping  ahead 
k  =  229  =  232/8  =  536870912  in  the  cycle,  starting  at  57  =  Tk{si-\)  for  /  =  2, . . .  ,8. 
To  use  the  streams,  apply  the  single-step  (jump  1)  transition  T(x)  in  each  stream. 

To  implement  the  leapfrog  method,  seed  the  RNG  with  a  32-bit  integer  5i  for  stream 
1.  Then  the  single-step  (jump  1)  transition  starts  the  other  7  streams  at  5/  =  T(sj-\) 
for  i  =  2, . . .  ,8.  To  use  the  streams,  apply  the  jump  8  transition  T*(x)  in  each 
stream.  MUVES  does  not  use  leapfrog,  but  the  concept  appears  later  in  another 
context. 

2.4  Jumping  Ahead  for  LCGs 

Either  method  is  useful  (and  easy  to  implement)  for  an  LCG,  because  repeated 
application  of  the  LCG  transition  T (x)  =  ( ax  +  c)%m  of  Eq.  5  is  another  LCG 

Tk(x)  =  ( cikX  +  Ck )  %  m  .  (10) 

To  compute  the  coefficients  of  the  jump  LCG,  note  that  modulo  m, 

x\  =  T(x  o)  =  ax  o  +  c 

9  9 

xi  -  T~(x o)  =  ax i  +  c  =  a{axQ  +  c)  +  c  =  a  xo  +  ac  +  c 

009  99 

X3  —T  (xq)  =  a  x 0  +  a~c  +  ac  +  c  -  a  x 0  +  (a  +  a  +  l)c  (11) 
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so  in  general  (mod  m ), 


k- 1 

Xk  -  Tk(x o)  =  cikXo  +  ck  =  ak xq  +  ^  a' 

i— 0 


k  ak  -  1 
C  =  a  Xq  H - •  C  . 

a  -  1 


(12) 


Apply  the  properties  of  Section  2d_  to  see  that 

(ak  -  1  )%[(a  -  1)  w] 


=  (a  )%m  and  c*  = 


a  -  1 


(13) 


High  powers  of  any  w  can  be  computed  with  “exponentiation  by  squaring.”  One  can 
obtain  w2''  by  starting  with  w  and  recursively  squaring  p  times,  each  doubling  the 
exponent.  The  sequence  so  obtained  is  w,  w2,  w4,ws, . . .  ,w2P  since  w  =  wl  =  w2° 
and 

(w2")1  =  w22"  =  w2" .  (14) 

In  other  words,  square  w  recursively  p  times  to  get  wrP . 


The  following  C  code  computes  LCG  coefficients  for  Tk  =  akx  +  ck  where  k  =  2P 
and  p  -  0, ...  ,32  based  on  T(x)  =  ax  +  c  with  MUVES  LCG  parameters. 


#include  <stdio.h> 

#include  <stdint.h> 
int  main()  { 

long  unsigned  a=1103515245 ,  c=12345,  m=lL«32,  ak[33]  ,  ck  [33],  p; 
_ uintl28_t  b; 

for(b=ak[0]=a,  ck[0]=c,  p=l;  p<=32;  p++)  { 
ak[p]  =  (  ak[p-l]  *  ak[p-l]  )  %  m; 
b  =  (  b  *  b  )  %  ((a-l)*m); 

ck[p]  =  (  (b-1)  %  C(a-l)*m)  /  (a-1)  *  c  )  %  m; 


} 


for(p=0;  p<=32;  p++) 

printf("k  =  2A%-2d  =  %101u  :  ak  =  %10u  ck  =  %10u\n" , 
p,  lUL«p ,  ak[p],  ck  [p] )  ; 


Note  ( a2'’  ')2  =  a2P  in  the  computation  of  ( a2P)%m ,  and  b  =  ( a2P)%[(a  -  1  )m\. 
The  latter  needs  128-bit  integers. 


The  output  is 

k  =  2A0  =  1  :  ak  =  1103515245  ck  =  12345 
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k 

- 

2A 1 

- 

2 

ak 

- 

3265436265 

ck 

= 

3554416254 

k 

= 

2A2 

- 

4 

ak 

= 

3993403153 

ck 

= 

3596950572 

k 

= 

2A3 

= 

8 

ak 

= 

3487424289 

ck 

= 

3441282840 

k 

= 

2A4 

= 

16 

ak 

= 

1601471041 

ck 

= 

1695770928 

k 

= 

2A  5 

= 

32 

ak 

= 

2335052929 

ck 

= 

1680572000 

k 

= 

2A6 

= 

64 

ak 

= 

1979738369 

ck 

= 

422948032 

k 

= 

2A7 

= 

128 

ak 

= 

387043841 

ck 

= 

3058047360 

k 

= 

2A8 

= 

256 

ak 

= 

3194463233 

ck 

= 

519516928 

k 

= 

2A9 

= 

512 

ak 

= 

3722397697 

ck 

= 

530212352 

k 

= 

2A 10 

= 

1024 

ak 

= 

1073647617 

ck 

= 

2246364160 

k 

= 

2  A 11 

= 

2048 

ak 

= 

2432507905 

ck 

= 

646551552 

k 

= 

2  A 12 

= 

4096 

ak 

= 

1710899201 

ck 

= 

3088265216 

k 

= 

2  A 13 

= 

8192 

ak 

= 

3690233857 

ck 

= 

472276992 

k 

= 

2A 14 

= 

16384 

ak 

= 

4159242241 

ck 

= 

3897344000 

k 

= 

2  A 15 

= 

32768 

ak 

= 

4023517185 

ck 

= 

2425978880 

k 

= 

2A 16 

= 

65536 

ak 

= 

3752067073 

ck 

= 

556990464 

k 

= 

2A 17 

= 

131072 

ak 

= 

3209166849 

ck 

= 

1113980928 

k 

= 

2A 18 

= 

262144 

ak 

= 

2123366401 

ck 

= 

2227961856 

k 

= 

2A 19 

= 

524288 

ak 

= 

4246732801 

ck 

= 

160956416 

k 

= 

2A20 

= 

1048576 

ak 

= 

4198498305 

ck 

= 

321912832 

k 

= 

2  A  2 1 

= 

2097152 

ak 

= 

4102029313 

ck 

= 

643825664 

k 

= 

2  A  2  2 

= 

4194304 

ak 

= 

3909091329 

ck 

= 

1287651328 

k 

= 

2  A  2  3 

= 

8388608 

ak 

= 

3523215361 

ck 

= 

2575302656 

k 

= 

2A24 

= 

16777216 

ak 

= 

2751463425 

ck 

= 

855638016 

k 

= 

2  A  2  5 

= 

33554432 

ak 

= 

1207959553 

ck 

= 

1711276032 

k 

= 

2A26 

= 

67108864 

ak 

= 

2415919105 

ck 

= 

3422552064 

k 

= 

2A27 

= 

134217728 

ak 

= 

536870913 

ck 

= 

2550136832 

k 

= 

2  A  2  8 

= 

268435456 

ak 

= 

1073741825 

ck 

= 

805306368 

k 

= 

2A29 

= 

536870912 

ak 

= 

2147483649 

ck 

= 

1610612736 

k 

= 

2A30 

= 

1073741824 

ak 

= 

1 

ck 

= 

3221225472 

k 

= 

2  A  3 1 

= 

2147483648 

ak 

= 

1 

ck 

= 

2147483648 

k 

= 

2  A  3  2 

z= 

4294967296 

ak 

= 

1 

ck 

z= 

0 

See  the  MUVES  source  code  Rn/RnLegacy.cpp,  where  and  Ck  for  k  =  2 2 9  are 
documented  as  “magic  beans”. 

2.5  Legacy  Issues 

The  legacy  RNG  fails  most  common  tests  of  randomness.  This  alone  is  a  reason  to 
reject  the  legacy  RNG  outright. 

The  period  of  229  «  5  x  108  is  too  short.  At  a  rate  of  9  million  random  number 
draws  per  second  this  is  exhausted  in  60  seconds.  In  terms  of  how  many  random 
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quantities  MUVES  uses,  consider  an  analysis  with  1000  cells  in  a  view  and  1000 
BAD  fragments  per  shot  using  8  threats  and  8  velocities.  With  9  iterations,  the 
number  of  fragments  has  already  exceeded  the  stream  length. 

The  resolution  of  4  digits  (15  bits)  does  not  adequately  cover  the  range  of  double¬ 
precision  uniform  values  on  the  unit  interval.  Full  IEEE  double  resolution  requires 
15  digits  (53  bits). 

Every  random  quantity  needs  its  own  stream  for  true  independence.  The  8  streams 
available  in  the  legacy  implementation  place  an  unreasonably  low  limit  on  the  pos¬ 
sible  number  of  independent  stochastic  quantities. 

The  legacy  implementation  provides  a  single  set  of  random  quantities,  and  as  such 
is  not  thread-safe  or  suitable  for  parallel  processing.  Independent  shotlines  need 
independent  sets  of  independent  quantities. 

We  need  a  high-quality  fast  64-bit  RNG  with  a  huge  period  that  can  easily  be  parti¬ 
tioned  (into  streams)  for  independent  parallel  processing  and  the  streams  partitioned 
(into  substreams)  for  independent  quantities. 

Such  a  system  is  the  topic  of  the  next  section. 
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3.  Linear  Feedback  Shift  Register 


Collins^ documents  the  details  of  T258,  the  RNG  currently  implemented  in  MUVES, 
based  on  the  linear  feedback  shift  register  (LFSR).  For  clarity  some  information  is 
repeated  in  the  following  along  with  subsequent  developments.  Following  the  defi¬ 
nition,  the  (equivalent)  matrix  and  recurrence  relation  representations  give  concrete 
examples  of  general  FFSR  implementation,  but  MUVES  uses  neither.  Instead,  an 
efficient  algorithm  (QT)  can  be  used  for  the  particular  class  of  FSFRs  in  T258. 

3.1  Definition 

F2  is  the  finite  field  with  2  elements  {0, 1 }  which  are  equivalent  to  bits.  In  F2  addition 
is  subtraction  as  1  +  1  =  0  and  1  =  -1.  Fet  xo,xi,X2,  ■ . .  be  a  sequence  fromF2.  An 
FFSR  sequence  obeys  some  recurrence  with  c;-  6  F2 


k-i 

Xn+k  —  ^  ^  ciXn+i  —  Cfc-lXj j+£-l  +  .  .  .  +  C\Xn+  \  +  CQXn  (15) 

1=0 

so  any  bit  is  determined  by  the  previous  k  bits.  Blocks  of  L  bits  can  yield  L-bit 
integers  z  with  0  <  z  <  2L  via  the  appropriate  output  function 

L- 1 

Z  =  lL~l~ixn+i  =  2  L~lXn  +  ...  +  2°Xn+L-\  (16) 

i—0 

or  real  numbers  u  with  0  <  u  <  1  via  the  output  function  u  =  z/2L 

L-l 

u  =  ^  2 ~'~lxn+i  =  2~lxn  +  . . .  +  2~Lxn+L-\  .  (17) 

i= 0 

3.2  Matrix  Representation 

The  matrix  representation  uses  a  k-bit  nonzero  state  vector  Xn  =  (xnfl,  ■ . . 
and  a  k  x  k  transition  matrix  A,  both  with  components  in  F2.  The  transition  recur¬ 
rence  is  Xn+\  =  AXn.  The  shift  is  xn+\ j  =  xn,i+ 1  for  i  =  0, . . .  ,k  -  2,  and  the  last 
component  x^+i^-t  of  Xn+\  is  determined  by  the  recurrence  of  Eq.  15.  Note  that 
in  general  Xn+k  =  AkXn.  Here,  A  implements  Xk+ 4  =  Xk+2  +  Xk  starting  with  Xq, 
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where 


'  0 

1 

0 

0  ' 

vo 

0 

0 

1 

0 

and  X0  = 

Xi 

0 

0 

0 

1 

x2 

1 

0 

1 

0 

.  *3  . 

Then  X\, . . . ,  X5  are  seen  to  be 


Xi 

x2 

V3 

V2  +  Vo 

v3  +  Vl 

X2 

x3 

V2  +  Vo 

V3  +  Xl 

vo 

x3 

x2  +  x0 

V3  +  Xl 

vo 

Vl 

X2+X0 

_  x3  +  VI 

v0 

Vi 

V2 

(18) 


(19) 


The  matrix  A  has  characteristic  polynomial  P{z)  =  det(z7  -  A)  =  z4  -  z2  -  1. 
Since  P(A )  =  0,  we  have  A4  =  A2  +  I  and  in  general  Ak+ 4  =  Ak+2  +  Ak.  Note  that 
X4  =  X2  +  Xo  and  X5  =  X3  +  X\  and  in  general  Xk+4  =  Xk+2  +  %k-  So  xn  and  X„ 
and  A”  follow  the  same  recurrence. 

We  never  use  the  matrix  implementation. 

3.3  Recurrence  Relations 

Any  matrix  A  has  characteristic  polynomial  (CP) 

C(z)  =  det(z/  -  A)  =  zk  -  Ck-\zk~{ - ciz  -  co  .  (20) 

The  Cayley-Hamilton  theorem  assures  that  C(A)  =  0,  therefore 

Ak  =  ck-\Ak~l  H - +  ciA  +  c07  .  (21) 

C(z)  is  also  the  CP  of  the  recurrence  Xn+\  =  AX„,  consider  Xll+k  =  AkXn,  so 


Xn+k  ~  Ck—\Xn+k—\  +  •  •  •  +  C\Xn+\  +  CqXu  .  (22) 

The  recurrence  gives  the  sequence  of  vectors  Xn,  where  each  c,  6  F2,  and  applies 
to  each  coordinate  (bit)  of  Xn. 

We  never  use  the  recurrence  to  implement  the  RNG. 
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3.4  QuickTaus  Trinomial  LFSR 


L’Ecuyer’s  QuickTaus  (QT)  algorithm^  uses  a  trinomial  recurrence  P(z)  =  zk  - 
zq  -  1  to  generate  a  (horizontal)  bit  stream  in  blocks  of  s  bits  according  to  bn+k  - 
bn+q  +  bn.  Word  size  is  L  bits.  A,  B,  and  C  are  words.  A  is  the  LFSR  state.  C  is  a 
mask  of  k  ones,  then  L  -  k  zeros. 


The  QT  algorithm  updates 

B  =  A  «  q; 

B  =  A  A  B ; 

B  =  B  »  (k-s); 

A  =  A  &  C; 

A  =  A  «  s; 

A  =  A  A  B ; 


//  q-bit  left-shift  of  A 
//  A  xor  B,  bitwise 
//  (k-s) -bit  right-shift  of  B 
//  A  and  C,  bitwise 
//  s-bit  left-shift  of  A 
//  A  xor  B,  bitwise 


The  C/C++  implementation  of  QT  is  succinct. 

C  =  -0x1  «  (L-k) ; 

A  =  (  (  A  &  C  )  «  s  )  A  (  (  (A  «  q)  A  A  )  »  (k  -  s)  ) ; 

Note  the  use  of  the  “twos  complement”  negative  integer  representation.  A  negative 
integer  is  stored  as  the  twos  complement  of  its  absolute  value,  which  is  the  sum  of 
its  ones  complement  (all  bits  reversed)  and  0x1.  So  -0x1  =  0xfff . . .  fff. 


An  example  illustrates  the  action  of  QT.  Consider  P(z)  =  z28  -  z3  -  1,  so  k  =  28 
and  g  =  3.  The  bit  recurrence  is  bn+ 2$  =  bn+ 3  +  bn.  The  block  size  is  s  =  17,  the 
word  size  is  L  -  32.  The  mask  is  C  =  -0x10  =  0xfffffff0,  which  has  k  =  28 
ones  followed  by  L  -  k  =  4  trailing  zeros.  So  the  C/C++  implementation  is 

x  =  (  (x  &  -0x10)  «  17  )  A  (  (  (x  «  3)  A  x  )  »  11  ); 

In  detail,  step  by  step: 

x  ',//(.  x0..x28  ,  x29..x31  )  z°  -  1,  initial  state  satisfies  P(z) 

y  =  x  «  3  ;  //  (  x3..x31  ,  3  @  0  )  z3,  step  forward  3 

y  =  y  A  x  ;  //  (  x28..x56  ,  x29..x31  )  by  the  construction,  z3  +  z°  =  z28 

y  =  y  »  11;  //  (  11  @  0  ,  x28..x48  )  tail=new  block  of  s  bits,  32  to  48 

x  =  x  &  C  ;  //  (  x0..x27  ,  4  @  0  )  make  room  for  new  block  in  x 

x  =  x  «  17;  //  (  xl7..x27  ,  21  @  0  )  pop  s  bits 

x  =  x  A  y  ;  //  (  xl7..x27  ,  x28..x48  )  combine  bitwise:  x+y  mod  2 

This  generates  bits  “horizontally”  in  blocks  of  17,  and  successive  words  are 

x  =  (  x0  ...  xl4  ,  xl5  . . .  x31  ) 

A(x)  =  (  xl7  . . .  x31  ,  x32  . . .  x48  ) 
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3.5  Characteristic  Polynomial 


Note  that  columns  are  “vertically”  leapfrog  (by  17)  sequences  from  P(z)  Using  any 
bit  position  x[b]  vertically  from  the  sequence  x=A(x),  the  characteristic  polyno¬ 
mial  can  be  computed  as  detailed  in  Collins 


28 


cv)  =  E' 


_  „/  _  „28  ,  .  „i/  ,  „i:>  .  „iu  t  _o  ,  .  _z  .  i 

CjZ  —  z  +  z  +  z  +  z  +  z  +  z  +  z  +  z  +  t 


.19 


17 


15 


,10 


(23) 


/=o 


C(z)  is  also  the  characteristic  polynomial,  hence  the  recurrence,  of  the  words  them¬ 
selves.  Only  2k  bits  are  required  for  this  operation,  making  the  LFSR  crypto¬ 
graphically  useless.  From  the  2k  bits  xq,.  . .  ,X2k-i,  construct  the  k  +  1  vectors 
Xi  =  (xi, . . .  ,Xi+k- 1)  each  of  length  k  for  i  =  0, . . .  ,k.  Since  Xk  is  a  linear  combi¬ 
nation  of  the  previous  k  vectors  Xq,  . . .  ,Xk- 1,  the  solution  of  the  linear  system 


Xk 

Xk- 1  ' 

'  X2 

X\ 

xq 

Ck- 1 

%k+ 1 

Xk 

■  X3 

X2 

xi 

Ck- 2 

Xk+ 2 

— 

Xk+ 1 

X4 

X3 

X2 

Ck-3 

X2k-1 

X2k-2  ' 

'  Xk+ 1 

Xk 

Xk- 1 

CO 

(24) 


provides  the  coefficients  of  C(z),  along  with  Ck  =  1  as  qIj  =  qX,-.  The 

Berlekamp-Massey  algorithm  is  an  efficient  algorithm  for  solving  this  system  of 
equations.  Also,  in  each  row  (j  =  0, ...  ,k  -  1)  we  see 


k- 1 


X k+j 


=E 

i=0 


ClX 


1-t-l+J 


(25) 


3.6  State  Details 


P{z)  generates  horizontal  bit  stream  computed  in  17-bit  blocks: 

101000000111100010011010000110100011101111000100101  . . . 
10100000011110001  00110100001101000  11101111000100101  ... 

Shift  to  fill  words,  new  block  on  the  right,  (partial)  old  block  shifted  left: 

010010110000100  10100000011110001 
100000011110001  00110100001101000 
110100001101000  11101111000100101 


Approved  for  public  release;  distribution  is  unlimited. 


11 


C(z)  generates  each  vertical  bitstream  and  also  the  words: 


01001011000010010100000011110001  =  0x4b0940f 1 
10000001111000100110100001101000  =  0x81e26868 
11010000110100011101111000100101  =  0xd0dlde25 

3.7  Jump 

The  division  algorithm  for  polynomials  is  analogous  to  the  integer  division  algo¬ 
rithm:  Divide  zd  by  CAz.)  to  get  the  quotient  Qdiz)  and  remainder  Jd(z),  which 
satisfy  zd  =  Qd(z)C(z )  +  Jd(z)  where  Jd  -  0  or  0  <  deg/rf  <  degC  =  k. 
The  remainder  is  the  jump  polynomial  with  coefficients  in  F2  =  {0,1}:  Jd(~)  = 
zd  mod  CAz)  =  Zfjo  jiZ1-  We  know  C(A)  =  0,  therefore  Ad  =  Jd(A),  and 


k- 1 


(26) 


Thus,  a  future  (jump)  state  is  some  linear  combination  of  only  k  successive  states. 
To  compute  J^p,  start  with  m  -  0  and  apply  p  iterations  of  squaring  mod  C(z) 


(27) 


Each  step  is  an  application  of  the  division  algorithm  via  synthetic  division  in  F2 
using  Knuth’s  algorithm  referenced  in  Collins.^ 

3.8  Jump  Example 

The  state  transition  is  x  =  A(x)  where 

unsigned  A  (  unsigned  x  )  { 

return  (  (x  &  -0x10)  «  17  )  A  (  (  (x  «  3)  A  x  )  »  11  ); 

} 

For  p  =  64,  the  2P  jump  polynomial  for  our  example  is 

J(z)  =  z21  +  z26  +  z25  +  z24  +  z22  +  z19  +  z18  +  z15  +  z11  +  Z9  +  z8  +  z6  +  z5  +  z2  +  1 

J  =  0x0f4c8b65  =  0000  till  0100  1100  1000  1011  0110  01012 

To  apply  to  state  x,  construct  the  F2-linear  combination  of  successive  states: 
unsigned  j,  t=x,  y=0; 

for(j=J ;  j;  j»=l,  t=A(t))  if  C  j  &  1)  y  A=  t; 


Then  y  =  A2P x  =  A26\  =  A18446744073709551616x. 
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State  vectors  and  polynomials  coefficients  (0  or  1)  are  packed  into  integers. 
XOR  (A)  is  F2  vector  space  addition. 


4.  T258 

In  practice,  a  single  LFSR  is  not  good  enough.  MUVES  implements  the  RNG  al¬ 
gorithm  T258,-  an  extension  of  L’Ecuyer’s  LFSR 25 8-  providing  probabilistically 
independent  sets  of  random  vectors  and  suitable  for  parallel  processing  (thread- 
safe). 


T258  uses  a  set  of  five  64-bit  LFSRs,  each  implementing  a  recurrence  defined  by  a 
primitive  polynomial  of  the  form  P(z)  =  zk  -  zq  -  1  in  bit  blocks  of  size  5  by  means 
of  the  QT  algorithm. 


Each  LFSR  is  implemented  with  QT  transition 

x  =  (  (  x  &  C  )  «  s  )  A  (  (  (x  «  q)  A  x  )  »  (k  -  s)  ) ; 

The  QT  parameters  for  the  5  components  of  T258  are 


# 

L 

k 

q 

s 

k-s 

L-k 

M  =  -C 

0 

64 

63 

1 

10 

53 

1 

0x000002 

1 

64 

55 

24 

5 

50 

9 

0x000200 

2 

64 

52 

3 

29 

23 

12 

0X001000 

3 

64 

47 

5 

23 

24 

17 

0x020000 

4 

64 

41 

3 

8 

33 

23 

0x800000 

The  LFSR  periods  are  Pj  =  2ki  -  1  for  k  e  {63,55,52,47,41}, 


The  Pj  are  pairwise  relatively  prime,  so  the  period  of  T258  is  P  —  nf=1  Pi  ~  2258. 


4.1  Implementation 


For  consistent  presentation  hereafter,  we  use  a  C++  class  to  implement  T258.  Com¬ 


plete  current  code  is  in  Section  9.  This  is 
algorithms  are  identical. 

typedef  uint64_t  uz; 
class  rng  { 
public : 

rngCuz  seed); 
rng&  init() ; 
rng&  t(); 
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//  T258 

II  ... 

/ /  constructor 
//  (re)initialize 
//  state  transition 
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rng&  jump(uz  p) ;  //  jump  2Ap 


rng&  jumpk(uz  k,  uz  p) ; 
uz  gen() ; 
double  U01Q; 
static  const  int  IS  =  86; 
static  const  int  JL  =  172; 
bool  operator != (const  rng& 
bool  operator== (const  rng& 
private : 

uz  s  [  5 ]  ; 
uz  seed; 

void  seedx(void) ; 
static  const  int  CPd[5] ; 
static  const  uz  JP [63]  [5]  ; 


//  jump  k*2Ap 
//  integer  generator 
//  u(0, 1)  generator 
//  log_2  (  small  jump  ) 
//  log_2  (  large  jump  ) 
w) ;  //  state  comparison 
w) ;  //  state  comparison 
//  ... 

//  LFSR  state  array 
//  this. seed 
//  set  seed 
//  CP  degrees 
//  jump  polynomials 


}; 


Representation  of  the  T258  state  transition  is 

rng&  rng::t()  {  //  T258:  C  s  q  k-s 

s [0]  =  C(s[0]  &  -0x000002)  «  10)  A  CCCs [0]  «  1)  A  s [0] )  »  53) 

s[l]  =  C Cs [1]  &  -0x000200)  «  5)  A  CCCs [1]  «  24)  A  s [1] )  »  50) 

s  [2]  =  C Cs [2]  &  -0x001000)  «  29)  A  CCCs [2]  «  3)  A  s  [2] )  »  23) 

s  [3]  =  C Cs [3]  &  -0x020000)  «  23)  A  CCCs [3]  «  5)  A  s [3] )  »  24) 

s  [4]  =  C Cs [4]  &  -0x800000)  «  8)  A  CCCs [4]  «  3)  A  s [4] )  »  33) 

return  *this; 

} 

The  64-bit  unsigned  integer  function  genQ  returns  values  in  0, . . .  ,264  -  1. 

uz  rng::gen()  {  //  64-bit  integer 

t();  //  transition 

return  s[0]  A  s [ 1]  A  s [2]  A  s [ 3 ]  A  s[4]; 


The  function  u01  C)  is  a  discrete  version  of  the  continuous  U (0, 1 )  distribution. 

double  rng::u01()  {  //  U(0,1) 

uz  z ; 

const  static  double  d  =  1 . 0/(lUL«53)  ;  //  1/2 A 5 3 
do  z  =  gen()  »  11;  while(!z); 
return  d  *  z; 
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4.2  The  Uniform(0,1)  Distribution 


The  real  function  u01()  uses  53  bits  of  the  full  64-bit  integer  to  compute  a  discrete 
version  of  the  continuous  U(0, 1)  distribution  with  evenly-spaced  equiprobable  out¬ 
put  i/253  for  i  =  1,2,3, . . . ,253  -  1.  These  are  the  expected  values  of  U{ 0, 1)  order 
statistics  for  a  sample  of  size  253  -  1  and  yield  the  correct  discrete  approximation. 
Note  the  symmetry  in  the  sense  that  u01()  and  1-U01O  have  the  same  distribution. 
In  fact,  53  is  the  largest  number  with  properties,  as  the  internal  IEEE  representation 
of  double  uses  53  bits  implicitly. 

It  may  be  tempting  to  obtain  more  “accuracy”  by  using  all  64  bits  and  dividing 
by  264,  but  this  is  an  error.  The  resulting  distribution  will  not  have  equiprobable 
evenly-spaced  values  and  will  not  be  symmetric.  Eventually,  some  such  x  >  0  will 
be  so  small  that  1  -  x  =  1.  The  program  will  crash  on  something  like  log(l  -  x) 
expecting  that  both  0  <  x  <  1  and  0  <  1  -  x  <  1 . 

4.3  Initialization 

L’Ecuyer-  states  a  condition  required  for  QT,  that  the  initial  state  must  be  a  valid 
recurrence  element: 

“For  this  algorithm  to  work  properly,  A  must  be  initialized  correctly  with  a  valid 
initial  .So :  that  is,  which  agrees  with  the  recurrence. 

[  General  initialization  algorithm  omitted.  ] 

If  the  additional  condition  L-k^r-sis  satisfied,  then  it  can  be  easily  verified 
that  after  the  first  pass  through  the  six  steps  of  QuickTaus,  A  will  necessarily  con¬ 
tain  a  valid  state,  even  if  the  initial  state  So  was  not  valid.  In  that  case,  the  above 
initialization  procedure  is  not  necessary  for  running  the  generator;  just  skip  the  first 
value.” 

For  T258,  the  additional  condition  is  satisfied  since  r  =  k  -  q,  the  general  initial¬ 
ization  algorithm  is  not  required,  and  correct  operation  is  obtained  by  “skipping  the 
first  value”,  implemented  in  the  initialization  code  as  a  single  call  to  QT  with  its 
return  value  discarded. 

Without  this  initialization  feature,  computation  of  the  C(z)  fails.  The  results  are 
random  depending  on  the  (incorrect)  initial  state.  Consequently,  computation  of  the 
J(z )  also  fails,  since  these  depend  on  the  C(z).  In  the  current  implementation  the 
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C(z )  are  not  used,  and  the  J(z)  are  pre-computed  and  stored  in  tables,  so  this  is  not 
an  issue.  However,  jump  computation  fails  even  with  correct  J(z). 

Moreover,  if  the  implementation  is  ever  changed  to  compute  new  J{z )  for  different 
jump  sizes,  include  dynamic  computation  of  C(z)  and  J(z),  replace  the  RNG,  or 
perform  diagnostic  tests  or  verification  and  validation,  etc.,  the  condition  is  neces¬ 
sary  or  else  the  computations  will  fail. 

4.4  Seeding 

L’Ecuyer-  presents  conditions  for  correct  seeding  of  LFSR  RNGs. 

“Before  calling  lfsrll3  for  the  first  time,  the  variabies  zi,  Z2,  zj,  and  Z4  must  be 
initialized  to  any  (random)  integers  larger  than  1,  7,  15,  and  127,  respectively.  In 
other  words,  the  kj  most  significant  bits  of  zj  must  be  nonzero,  for  each  j. 

Ideally,  the  vector  of  initial  seeds  (zi,  ■ . . , Zj )  would  be  drawn  from  a  uniform  dis¬ 
tribution  over  the  set  of  admissible  values.” 

Minimum  values  for  T258  seed  states  are  denoted  as  M  =  -C  in  the  parameter 
table  on  page  13.  If  x  <  M  then  QT{x )  =  0,  and  the  LFSR  is  stuck  at  0.  This  is 
known  as  the  sink  condition.  Thus,  only  seed  values  x  with  x  >  M  are  admissible, 
and  ideal  seed  values  x  are  uniform  random  with  x  >  M.  Note  that  C  =  —M  is  the 
QT  mask  value. 

4.5  Parallel  Processing 

Partition  RNG  with  period  P  «  2°  into  2s  independent  streams  of  length  2°~s 
for  parallel  processing  or  shotlines.  Partition  each  stream  into  2B  independent  sub¬ 
streams  of  length  2g~5~b,  the  effective  “period”  of  any  scalar  RNG  for  independent 
variables. 

T258  has  G  =  258,  and  MUVES  uses  5  =  B  =  86,  so  G  -  5  =  172  and  G  -  5  -  B  = 
86.  The  global  RNG  is  seeded  once  to  set  base  state.  Then  286  streams  are  separated 
by  long  jumps  of  2172  from  the  base.  In  each  stream,  286  substreams  are  separated 
by  short  jumps  of  286,  the  effective  substream  “period.” 
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4.6  RunTime 


Suppose  numbers  are  generated  at  a  rate  of  1  billion,  or  about  230,  per  second. 

The  legacy  LCG  with  a  single  stream  of  length  232  has  8  substreams  of  length  229. 
Each  will  run  for  0.5  seconds,  and  the  full  cycle  runs  for  4  seconds. 

One  year  «  602  •  24  •  365  «  224-91  «  225  seconds. 

For  T258:  a  substream  of  length  286  will  run  for  ~  286-30-25  =  231  «  2  billion  years, 
a  stream  of  length  2172  will  run  for  ~  2172-30-25  =  2117  «  1026  billion  years,  and  a 
full  cycle  of  length  2258  will  run  for  ~  2258-30-25  =  2203  w  1052  billion  years. 

Brute  force  verification  is  impractical.  The  numbers  are  large: 

2258  =  463 1683569492647S 169428394003475 163 1413079938662562256 15783033603 16525 18559744  »5x  1077 
P  =  463168356949050750352076184268918090343706927944462529355293134289296410279935  *  5  x  1077 
2258  _  p  = 214031342207755765833541069373010718099726802537201742356108279809  w  2  x  1065 
2172  =  5986310706507378352962293074805895248510699696029696  *  6  x  1051 
286  =  77371252455336267181195264  as  8  x  1025 
P  <  2258  so  the  number  of  full  streams  is  not  286  but  floor(P/2172)  =  77371252455300513717551104  as  8  X  1025 

number  of  streams  “lost”  =  35753463644160  as  4  X  1013 

A  useful  approach  to  verification  follows  from  SCR  2138,  Section  7. 

5.  SCR  1049-1050 


This  following  is  equivalent  to  the  initial  2008  implementation  of  T258  in  MUVES 
as  presented  in  SCR  1049  “Replace  existing  Uniform  random  number  generator” 
and  SCR  1050  “Provide  independent  RNG  streams  for  DMUVES”. 

The  constructor  saves  the  seed  and  initializes  the  system. 

rng::rng(uz  seed)  { 

this->seed  =  seed; 

initO ; 

} 

The  initializer  seeds  the  LFSRs  and  invokes  a  single  transition. 

rng&  rng : : init()  { 
seedx() ; 
tO; 

return  *this; 

} 
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The  original  seeding  routine  used  a  single  seed  integer  and  an  auxiliary  LCG  of  the 
form  xi+ 1  =  c  ■  Xi  to  obtain  5  uniformly-distributed  seed  values. 

void  rng::seedx()  { 
uz  c  =  69069; 


s [0]  =  c*seed 

if(s[0] 

< 

2) 

s  [0] 

+= 

2u 

s [1]  =  c*s[0] 

if Cs [ 1] 

< 

512) 

s[l] 

+= 

512u 

s [2]  =  c*s[l] 

if Cs [2] 

< 

4096) 

s  [2] 

+= 

4096u 

s [3]  =  c*s[2] 

if Cs [3] 

< 

131072) 

s  [3] 

+= 

131072u 

s [4]  =  c*s[3] 

i f  Cs [4] 

< 

8388608) 

s  [4] 

+= 

8388608u 

} 

The  system  uses  an  array  of  65  polynomials  “const  uz  rng: :  JP  [65]  [5]”  for  jumps 
of  IP  where  p  =  (86, 172, 173, 174,. ..  ,233,234,235  =  172  +  63).  MUVES  uses 
4  independent  substreams  (for  variables,  separated  by  286)  in  each  thread  stream 
(separated  by  2172). 

Generators  for  stream  k  with  0  <  k  <  264  are  obtained  by  applying  k  large  jumps 
of  size  2172  using  the  binary  decomposition  of  k  followed  by  small  286  jumps  for 
substreams.  Note  that  if  the  nth  bit  bn  of  k  -  is  nonzero,  then  jp[n+l] 

can  be  used  to  advance  each  LFSR  by  2171+'7  =  2"  •  2172  for  total  offset  of  k  ■  2172. 
This  is  efficient  even  for  large  k ,  unlike  iterating  the  2 172  jump  k  times.  Then  each 
substream  i  is  advanced  by  i  ■  286  to  obtain  offsets  of  k  ■  2172  +  i  ■  286  for  i  =  0, 1,2,3. 

6.  SCR  1908 


In  2014,  SCR  1908  “Changes  to  Tausworthe  T258  Random  Number  Generator” 
realized  these  modifications: 

Description:  The  seeding  routine  in  T258  is  more  restrictive  than  necessary  and  it 
makes  a  call  to  the  generator  itself  before  returning.  This  prevents  setting  the  five 
states  to  the  same  seed  and  making  a  call  to  the  generator  introduces  confusion 
when  comparing  to  the  standalone  behind-armor  debris  model. 

and  the  document  scr .  pdf  referenced  in  the  SCR  contains: 

•  Seeding  routine  is  more  restrictive  than  necessary,  in  two  respects: 

-  Seed  is  multiplied  by  69069  each  time  an  internal  state  is  set. 

-  Makes  a  call  to  the  RNG  itself  before  the  seeding  procedure  returns. 


The  current  seeding  routine  (see  Figure  1  [omitted])  multiplies  the  seed  by  69069 
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each  time  that  the  five  states  of  the  RNG  are  set.  There  is  no  need  to  do  this,  and  it 
prevents  us  from  setting  the  five  states  to  the  same  seed. 

The  initialization  method  implemented  by  SCR  1908  is  equivalent  to: 

rng&  rng : : init()  { 
seedxO  ; 
return  *this; 

} 

The  seeding  method  implemented  by  SCR  1908  is  equivalent  to: 
void  rng::  seedxO  { 

s [0]  =  seed;  if(s[0]  <  2)  s [0]  +=  2u; 

s [ 1]  =  seed;  if(s[l]  <  512)  s [ 1 ]  +=  512u; 

s [2]  =  seed;  if(s[2]  <  4096)  s [2]  +=  4096u; 

s [3]  =  seed;  if(s[3]  <  131072)  s [3]  +=  131072u; 

s [4]  =  seed;  if(s[4]  <  8388608)  s [4]  +=  8388608u; 

} 

6.1  Consequences 

6.1.1  Jump  Computation  Failure 

Omission  of  the  valid  initial  state  criterion  of  Section  4.3  (by  removing  the  initial¬ 
ization  call  to  the  RNG)  introduces  the  problems  presented  in  that  section.  This 
includes  the  failure  of  jump  computation  even  for  correct  jump  polynomials. 

The  requirement  for  statistically  independent  random  substreams  across  all  streams 
is  essential  for  the  validity  of  the  simulation.  This  is  obtained  by  partitioning  the 
overall  cycle  into  non-overlapping  segments,  guaranteed  by  correct  jump  compu¬ 
tations.  When  the  jump  computations  fail,  the  resulting  possible  overlap  makes  the 
claim  of  independence  invalid. 

6.1.2  Stream  independence  Failure 

Use  of  the  same  single  seed  for  all  5  LFSRs  (instead  of  uniform  random  seeds 
derived  from  a  single  value)  introduces  another  problem. 

Put  simply,  if  you  run  a  simulation  with  seed  x  one  day,  and  seed  x  +  l  the  next 
day,  the  results  will  not  be  independent  (as  is  required  for  a  random  sample).  Se¬ 
quentially  seeded  cycles  are  not  independent.  If  the  seeds  themselves  are  generated 
by  some  random  process  (clock  time,  process  id,  /dev/random,  /dev/urandom,  ra¬ 
diation,  etc)  this  is  likely  not  an  issue.  But  if  a  human  needs  3  seeds  for  a  random 
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sample  of  runs,  she/he  just  might  pick  something  like  666,  667,  and  668.  Then  the 
following  problem  manifests. 

A  number  is  Borel  normal  in  base  r  if  every  sequence  of  k  symbols  in  the  letters 
0, 1, . . .  ,r  -  1  occurs  in  the  base-r  expansion  of  the  given  number  with  the  expected 
frequency  r~k.  Uniform  random  numbers  are  Borel  normal. 

Let  x  be  a  64-bit  (r  =  2)  uniform  random  integer  and  b(x)  =  the  number  of  Is  in  x. 
Then  b(x)  ~  5(64, 1/2),  where  B(n,p )  is  the  binomial  distribution  and  q  =  1  —  p. 
Here,  n-  64  and  p  =  q  =  1/2.  Then  E  b{x)  =  np  =  32  and  Var  b(x)  =  npq  =  16. 


If  (jCjO^p  is  a  sequence  of  such  x ,  then  the  b(xi)  are  iid  B(n,p),  asymptotically 
normal 

b(xi )  ~  N(np,npq )  (28) 


If  (jt/iO^j  for  j  =  1,. . .  ,k  are  k  such  sequences,  then  the  5)  =  {xy;  :  j  =  1 _ ,k} 

are  iid  random  samples  of  size  k  from  B(n,p).  Then  the  sample  means  m,  = 
mean (6))  =  j-  Xy=i  b(xp )  are  iid  asymptotically  normal 


m,- 


A 


(29) 


and  the  sample  variances  v, 
cally  normal 

Vf  ~ 


=  var (S))  =  \  Yj)=  i  (%)  -m/)’ 


N  \  npq,(npqY 


k-  1 


are  iid  asymptoti- 
(30) 


where  k  is  the  excess  kurtosis 


1  -  6 pq 

k  = - 

npq 


(31) 


Graphs  follow  for  i  =  1,. . .  ,500  and  k  =  1000,  where  k  sequential  seeds  were 
chosen.  First  we  see  the  series  results  for  seeds  0  through  999,  offset  0,  to  the  same 
scale  in  Fig.  1  and  with  the  correct  Q  method  magnified  to  show  detail  in  Fig.  2. 
Then  in  Figs.  3  and  4  we  see  the  same  presentation  for  an  arbitrary  offset  s0,  for 
seeds  s0  through  s()  +  999.  Then,  without  regard  to  the  series,  we  see  the  cumulative 
distribution  functions  (CDFs)  for  offset  0  in  Fig.  5  and  offset  s0  in  Fig.  6. 
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M:  b,  s  =  0  +  sO 


M:  m,  s  =  0  :  999  +  sO 


M:  v,  s  =  0  :  999  +  sO 


Q:  b,  s  =  0  +  sO 


Q:  m,  s  =  0  :  999  +  sO 

40  - 

35  - 

30  - 

25  - 

- 1 - 1 - 1 - 1 - 1 

0  100  200  300  400  500 


Q:  v,  s  =  0  :  999  +  sO 


Fig.  1  b  series,  no  offset,  same  scale 
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Q:  b,  s  =  0  +  sO 


M:  b,  s  =  0  +  sO 


M:  m,  s  =  0  :  999  +  sO 


Q:  m,  s  =  0  :  999  +  sO 


M:  v,  s  =  0  :  999  +  sO 


Q:  v,  s  =  0  :  999  +  sO 


Fig.  2  b  series,  no  offset,  detail 
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M:  b,  s  =  0  +  sO 


Q:  b,  s  =  0  +  sO 


M:  m,  s  =  0  :  999  +  sO 


Q:  m,  s  =  0  :  999  +  sO 


M:  v,  s  =  0  :  999  +  sO  Q:  v,  s  =  0  :  999  +  sO 


Fig.  3  b  series,  arbitrary  offset,  same  scale 
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M:  b,  s  =  0  +  sO 


Q:  b,  s  =  0  +  sO 


Fig.  4  b  series,  arbitrary  offset,  detail 
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Fig.  5  b,  m,  v  CDFs,  no  offset 
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Fig.  6  b,  m,  v  CDFs,  arbitrary  offset 
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7.  SCR  2138 


In  2015,  SCR  2138  “MUVES  Random  Number  Generator  initialization  correction” 
remediated  the  effects  of  SCR  1908.  The  initialization  generator  call  was  reinstated 
to  assure  correct  jump  computations  and  statistical  independence  of  everything. 

rng&  rng : : init()  { 
seedxO  ; 
tO; 

return  *this; 

} 

An  improved  seeding  scheme  uses  a  better  LCG  that  SCR  1049,  available  in  L’Ecuyer.- 
This  reestablishes  the  independence  of  sequentially- seeded  streams  lost  in  SCR 
1908. 


void  rng:: seedxO  {  //  set  seed 

uz  c  =  0x27bb2ee687b0b0fd,  d  =  0x891087b8e3b70cbl ; 
do  s[0]  =  c*seed+d;  while(s[0]  <  0x000002); 

do  s [ 1 ]  =  c*s[0]+d;  while(s[l]  <  0x000200); 

do  s[2]  =  c*s[l]+d;  while(s[2]  <  0x001000); 

do  s [3]  =  c*s[2]+d;  while(s[3]  <  0x020000); 

do  s[4]  =  c*s[3]+d;  while(s[4]  <  0x800000); 

} 

The  final  element  of  SCR  2138  implements  an  improved  algorithm  for  jump  com¬ 
putation  based  on  the  identity  of  Section  7.1.  Previous  computations  are  a  “special 
case”  of  the  new  algorithm,  in  that  the  results  are  identical.  The  old  algorithm  could 
compute  jumps  of  size  286  and  k2112  for  k  =  0, . . .  ,264  -  1.  The  new  algorithm  is 
more  transparent  but  capable  of  jumps  k  2P  for  both  p  and  k  in  {0, . . . ,  264  -  1 }. 

In  either  case,  suppose  that  t()  is  a  single-step  state  transition  and  that  j(p)  advances 
the  RNG  state  by  2P  steps.  With  the  old  algorithm,  one  would  check  that  either  286 
applications  of  /()  or  one  application  of  j (86)  to  some  initial  state  results  in  the 
same  final  state.  At  a  rate  of  1  billion  per  second  this  would  take  2  billion  years. 
For  j  ( 172),  this  would  take  1035  years.  A  major  benefit  of  the  new  formulation 
is  that  the  computations  can  be  verified  more  efficiently  due  to  the  availability  of 
arbitrary  j{p).  One  need  only  check  that  /()  and  /(())  give  the  same  state  and  for  p  = 
1,2,3,. . .  ,/»max  that  2  applications  of  j(p  -  1)  give  the  same  state  as  1  application 
of  j(p),  since  2  •  2/,~l  =  2P.  For  pmax  =  236  it  takes  about  5  milliseconds  to  check 
that  any  such  j(p)  is  correct. 
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7.1  Identity 


The  LFSRs  in  T258  generated  by  polynomials  of  the  form  P(z)  =  zk  +  zq  +  1  in 
blocks  of  size  s  have  periods  m  =  2k  -  1.  By  the  division  algorithm  a  jump  size 
of  2 p  can  be  written  as  TP  -  Qm  +  2 p%m  for  integer  Q,  thus  2P  and  2 p%m  are  the 
same  jump. 

In  fact,  2 P%m  =  2p%k,  so  only  the  jumps  2P  for  p  =  0, 1,2,3, . . .  ,k  -  1  for  each 
LFSR  are  required  to  compute  a  T258  jump  of  2P  for  any  p  >  0. 

Formally,  we  assert  that 

2p%(2k  -  1)  =  2p%k  for  k  >  2  and  p  >  0  .  (32) 


Proof: 

Up  <  k  then  2P  <  m  and  2 ,p%m  =  2P  =  2p%k. 

If  p  =  k  then  2P  =  m  +  1  and  2 P%m  =  1  =  2p%k. 

Suppose  the  claim  holds  for  some  p  >  k  and  all  p'  <  p,  and  consider  p  +  1: 
2P+ 1  %m  =  (2  •  2 p)%m  =  ((2 %m)  ■  (2 p%m))%m 
=  (2  •  2 p%k)%m,  by  induction,  as  it  holds  for  p 

_  (2p%k+l)%m  =  2(p%k+V)%k ,  by  induction,  as  p%k  +  1  <  k  <  p 

_  1  _  ^(p+l^ck  QED 

If  k  =  1,  then  m  =  2k  -  1  =  1  and  2P%1  =  0  p  2P%1  =  2°  =  1.  So  k  >  2  is  required. 
Note  that  various  steps  in  the  proof  also  rely  on  k  >  1. 

7.2  Implementation 

T258  characteristic  polynomial  degrees  are 

const  int  rng::CPd[5]  =  {  63,  55,  52,  47,  41  }; 

Only  the  polynomials  for  2°  through  262  ,  254  ,  251,  246,  and  240,  are  required  for  the 
5  LFSRs,  respectively.  Then  any  jump  2P  for  p  =  0, . . .  ,264  -  1  can  be  computed  by 

rng&  rng : : j ump (uz  p)  {  //  jump  2Ap 

uz  i,  j,  a [5]  =  {  0  }; 
for (i=l ;  i;  i«=l,  t()) 
for (j=0 ;  j<5 ;  j++) 

if(JP[p  %  CPd[j] ] [j]  &  i)  a[j]  s[j] ; 
memcpy(s,  a,  5*sizeof (uz)) ; 
return  *this; 
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} 

Any  jump  of  size  k2p  can  be  computed  by 

rng&  rng : : j umpk (uz  k,  uz  p)  {  //  jump  k*2Ap 

for(;  k;  k»=l,  p++)  if(k  &  1)  jurap(p); 
return  *this; 

} 

which  uses  the  binary  representations  k  =  £ -”0  bj 2l  =  bm2m  +  •  •  •  +  bo2°  and 
k2p  =  z;=0  bili+P  =  bm2m+p  +  ■■■  +  bo  2p. 

Increments  for  small  (variable)  and  large  (stream)  jumps  are 

static  const  int  JS  =  86;  //  log_2  (  small  jump  ) 

static  const  int  JL  =  172;  //  log_2  (  large  jump  ) 

Stream  k  can  be  set  up  by 
rng  *z[4] ; 

for(int  i=0;  i<4;  i++)  { 
z[i]  =  new  rng (seed); 

z[i]->jump(k,  rng : : JL) . jump(i ,  rng::JS); 

} 

7.3  Jump  Verification 

These  operators  compare  T258  state  arrays. 

bool  rng :: operator != (const  rng  &w)  {  //  state  inequality 
uz  e  =  0; 

for(int  i=0;  i<5;  i++)  e  |=  s[i]  A  w.s[i]; 
return  e; 

} 

bool  rng :: operator== (const  rng  &w)  {  //  state  equality 
return  !  (  *this  !=  w  ); 

} 

Then  success  of  the  following  code  verifies  jump  computations  for  the  given  seed 
for  all  jumps  2P  with  p  =  0, 1,2,3, . . .  ,pmax. 

void  Vjump(uz  seed,  uz  pmax)  {  //  inductive  jump  verification 

rng  z0(seed),  zl(seed);  //  T258  objects,  same  state 
cout  «  "VjumpC"  «  Px(seed)  «  ",  "  «  dec  «  pmax  «  ") :  "; 
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if(  z0.jump(0)  !  =  zl.tO  )  {  //  base 
cerr  «  "fail  base  check"  «  endl; 
exit(l) ; 

} 

for(uz  p=l;  p<=pmax;  p++  )  //  induction 

if(  z0 . init() . jump(p)  !=  zl .  initO  .  jump(p-l)  .  jump(p-l)  )  { 
cerr  «  "fail  induction  check,  p="  «  dec  «  p  «  endl; 
exit(l) ; 

} 

cout  «  "for  all  p  =  0:"  «  pmax  «  "  jump(p)  ==  tA(2Ap)"  «  endl; 

} 

This  checks  that  jump(0)  is  a  single  transition  t()  and  inductively  that  jump(p) 
is  equivalent  to  2  applications  of  jump(p-l),  as  2P  =  2  •  2P~ 1 .  Upon  success,  this 
establishes  that  jump(p)  implements  a  jump  of  2P  for  p  =  0, 1,2,3, . . .  ,pmax- 

8.  SCR  2142 

Released  in  2015,  SCR  2142  “Modernize  Rn  package  to  C++  and  STL  use”  reim¬ 
plements  the  RNG  system  in  C++  and  provides  code  cleanup. 

The  redundant  old  jump  code  (available  as  an  option  in  SCR  2138)  was  removed, 
leaving  only  the  new  SCR  2138  code.  From  the  SCR  text: 

For  SCR2138,  new  jumping  code  was  added,  but  the  old  jumping  code  was  left  in 
place  because  it  was  not  understood  that  the  new  jumping  code  is  able  to  provide 
the  same  functionality.  The  old  jump  code  will  be  removed  as  part  of  this  SCR  as 
the  Rn  code  is  cleaned  up. 

SCR  2142  also  suggests  exploiting  the  Standard  Template  Library  (STL): 

Note  that  the  C++  STL  contains  classes  for  random  number  generation  and  it  is 
possible  to  adapt  the  T258  RNG  (as  a  random  number  engine)  for  this  use.  The  C++ 
STL  contains  a  complete  set  of  random  distributions  and  it  is  highly  desirable  to 
use  these  instead  of  reimplementing  all  the  distributions  in  MUVES  when  standard 
versions  already  exist.  Therefore,  use  of  the  STL  should  be  investigated  as  part  of 
this  SCR  (both  SCR2061  and  SCR2105  will  require  C++1 1  and  the  C++1 1  STL). 

The  new  C++ 11  standard  library  random  number  generation  feature  is  exposed 
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by  including  the  <random>  header.  At  this  time,  the  MUVES  development  and 
distributions  platforms  do  not  provide  C++ 11  compilers.  So  incorporation  of  the 
RNG  system  into  the  C++1 1  <random>  library  framework  has  been  deferred. 

9.  Demonstration  and  Verification 

The  code  t258.h  in  Section  9.1  is  a  complete  C++  class  implementation  of  T258 
equivalent  to  the  current  SCR  2138  MUVES  version  with  seeding,  initialization, 
state  transition  t(),  integer  and  real  U (0, 1 )  generators,  and  functions  jump(p)  and 
jumpk(k,p)  for  jumps  of  2P  and  k 2P,  respectively,  where  p  and  k  are  64-bit  un¬ 
signed  integers  in  the  range  0, . . . ,  264  -  1 . 

The  driver  code  in  Section  9.2  implements  the  class  in  a  working  program.  The 
driver  uses  a  single  seed,  0xd8940e83ec602c7b  =  15606114568514841723. 

The  function  Vtab  generates  tables  of  integer  and  real  output  to  demonstrate  the 
stream  and  substream  jumps  used  in  MUVES.  The  resulting  tables  on  pages  32 
and  33  were  provided  to  MUVES  developers  as  a  sanity  check  and  were  incorpo¬ 
rated  into  MUVES  unit  testing. 

The  function  Vjump  verifies  jump  computation  for  p  =  0, . . . ,  236.  The  largest  jump 
possible  in  the  current  MUVES  implementation  has  k  =  264  -  1  and  p  =  172,  so 
k2p  <  264  •  2 172  =  2236.  On  success,  one  sees 

Vjump(0xd8940e83ec602c7b,  236):  for  all  p  =  0:236  jump(p)  ==  tA(2Ap) 

On  failure,  one  sees  either 

Vjurap(0xd8940e83ec602c7b,  236):  fail  base  check 

if  jump(0)  is  not  the  same  as  t(),  or  something  like 

Vjurap(0xd8940e83ec602c7b,  236):  fail  induction  check,  p=7 

if  jump(p)  .  jumpCp)  is  not  the  same  as  jumpCp+1)  for  some  p.  The  latter  occurs  if 
the  transition  call  is  removed  from  the  initialization,  as  in  SCR  1908. 

Of  course,  success  for  a  single  seed  is  not  proof.  But  similar  code  had  no  failure 
with  billions  of  different  random  seeds,  which  is  not  proof  either  but  provides  reas¬ 
surance  that  things  are  working. 
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Jump  demonstration  table,  integer: 


Vtab(0xd8940e83ec602c7b,  uz) :  jump  =  J*2A172  +  K*2A86 
J  =  0 

n  K=0  K  =  1  K=2  K  =  3 

1  0xcl6e6cacbl0aafb7  0x8698d7ee73267dfd  0x3b24e9553fe2ab49  0xb501el7f9a22cb5b 

2  0x4197d305ac8dl4d7  0xa6ba3076faf9803d  0x377b2a9c373cd544  0x51al5d9dfcla0£b9 

3  0xad851fdfedb01c77  0xffa700clf6894ab6  0xlc8afleff5a013e0  0x641f61b7c5206628 

4  0xblc4cld91fd598d0  0x4b603c781fcc564a  0x4d016dac0el8f0cf  0xc7b5e5cfb78eca78 

5  0xel5f23003cf4a5bd  0x25ed9bbeld862fl7  0xfb244937cel81286  0xc354£0d48b67a904 

J  =  1 

n  K=0  K  =  1  K=2  K  =  3 

1  0x4c6be74635dd95f9  0x8da677eld0cl02fc  0x5£94da92d55163£c  0xba2853e0ae2053ed 

2  0x37a4ffc68991c454  0x947f071f49deec2a  0x273c05a49c944ba5  0xlc21fl90185d6326 

3  0xb9£3d6e556357739  0xee429bf79386d48a  0x435a62d0al3dd575  0x55622bd6b0745921 

4  0x592 30062 54cd0202  0x3a0987eaf69515c0  0xd95a9d3985026c3a  0x8a0a27de969e2d40 

5  0xbf 1407 lad4ff 5297  0xe93cldaf844af7cf  0x9032a81041006403  0x5213f9f085bd25e8 

J  =  2 

n  K=0  K  =  1  K=2  K  =  3 

1  0x71590ce85a42d74d  0x97699306050bd5 14  0x8d571al3e87d55a9  0x4f21c38777b5flbb 

2  0xde5ff97d8853eea7  0xc221a247176e77d4  0xda7586557e84eaal  0x64330c2eb£01£e85 

3  0x280615a8b54284f 1  0x5d33a5b5585a7364  0xa25f2c70d63f7e4e  0x861a46a5ff29b9e0 

4  0xa5£63cacd6e41d91  0x2c0bc7999bb29££3  0x617c53d£2a932bc8  0x66992fc9cla53027 

5  0xa3c79c2c534b27b4  0x222 lba242669c£cb  0x2fdd232165791dld  0x3888d26261dce6c2 

J  =  3 

n  K=0  K  =  1  K=2  K  =  3 

1  0xff71ca0b732c4949  0x867dl29cd858c058  0xa6ec51930d817e66  0x9dabc7e737ed494f 

2  0xbc77a72aaf89753e  0x9d61dl4749155805  0x588ae0b6b281981f  0xc7bcl546bc2309c3 

3  0x2de469f3f7ddede3  0xlf4e610a2f7263e0  0xd71668e4417bdffc  0x766dbeb7f 102b3a7 

4  0x££672a0c36125e2c  0xbl0593560a59ecd2  0xl52646fe75348e59  0xf63d8f4fe717a0b9 

5  0x9159bb906bebcb03  0xa7a301£2ed7£ec8d  0x£4£791cbf9bc6989  0xdfbb782b73b6ae5d 

J  =  4 

n  K=0  K  =  1  K=2  K  =  3 

1  0xd9£9bddd££303202  0x0173a3a4£2254c51  0xbfd8aaffdl9a665e  0x58a927c37a3602d3 

2  0x86d7897dbl55a745  0x5bdf4b8032d914£9  0x4e09bc87a5c7922c  0x83c947bdc7083b94 

3  0xe819eb79b2aa8d0a  0xa8a74e7e6d0cl038  0x505701b38a6bfb90  0x62c86ca£c75dc91c 

4  0xl4c91fcd7c608b£c  0xdb8£97503a9dl574  0x562al80930c36f54  0xbl02e6e501b83485 

5  0x52785e23c9d880ed  0xbbfc72e609c96ba3  0x4e5031d2d4a0f437  0x9e43932dfac8e9dc 

J  =  5 

n  K=0  K  =  1  K=2  K  =  3 

1  0x0cl4a3a5d302c989  0x660abd0396e£14a3  0xa£60blaffb3e53a6  0x7ece213balec8811 

2  0x8e£dlff4d7el941b  0xb743££3460f62e9f  0xal507842e9da2f5d  0x628c35d2c5c7724f 

3  0x623b009el0cc32fe  0x93d3202a8536b686  0xcecfe9fl54120adb  0x59969f88b7901893 

4  0xe3030e94754a8c27  0x2893a81ecb8c6aa7  0x249eb96507002c4f  0xb28a90711088ba68 

5  0xe895ela£53cae98e  0x4al9el22b28447be  0x9d92eaad875e312£  0x7227913afc65bb4a 

J  =  6 

n  K=0  K  =  1  K=2  K  =  3 

1  0xad7f6af0771f42d0  0xa6735 17c3e97a375  0xeae617e06dcdfcd2  0xl6a04a9eee343931 

2  0x7152e34£71bl3bdf  0xbf 549048b46c344b  0xl32d592b77eecfaa  0x7al77d8c7427d713 

3  0xc91be73b58418123  0xad0064cc27aab021  0x£d91c38a7805f8d7  0x82£297163077e680 

4  0x4e55e470£36f030c  0x500ea6ba£0d3b074  0x6ea259bd74ef380c  0x5bl£1259957dc675 

5  0x085942e71dc8af6b  0xfb5£a80c2bb01aaf  0x92679c69aa£343ff  0xb74e8d82a30ec277 

J  =  7 

n  K=0  K  =  1  K=2  K  =  3 

1  0xfbcl292£c680a416  0xe45087ee40f 56993  0xe£6a5b85ba23a884  0xc0e725b06d4e72fb 

2  0x6d9blaf20461b9bc  0x2c4566e9fde33783  0x38e0e42505739ea7  0x90b024dc86178651 

3  0x631011d91b£6b2e9  0x75987bb9443e5261  0x6£e6183c£784ebd0  0xed8b48c5ce0a240f 

4  0xff48c5924e2590a4  0x6d37£cce4dad574c  0x916a0c91c447963a  0x9515183807fc0£59 

5  0x6e62cl7ab7b53e40  0x92f288c5f7ec56a6  0x4fll9dlea6a778ea  0xc8700e32f6366368 
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Jump  demonstration  table,  real: 


Vtab(0xd8940e83ec602c7b,  u01):  jump  =  J*2A172  +  K*2A86 
J  =  0 

n  K=0  K  =  1  K=2  K  =  3 

1  0.7555911943063999  0.5257697064545485  0.2310319741371908  0.7070599495438145 

2  0.2562229050495726  0.6512785234908502  0.2167231208215867  0.3188684950547228 

3  0.6778125688267099  0.9986420129339751  0.1114951334799346  0.3911038468987740 

4  0.6944085269058945  0.2944371979972840  0.3008030457550352  0.7801192886468541 

5  0.8803579211789434  0.1481568659924269  0.9810224305384454  0.7630148428828806 

J  =  1 

n  K=0  K  =  1  K=2  K  =  3 

1  0.2985214754497660  0.5533213545138587  0.3733650787810965  0.7271778510604474 

2  0.2173614368162538  0.5800632907283859  0.1532596136936074  0.1098929383082257 

3  0.7263769445327932  0.9307038764414088  0.2630979308462351  0.3335292243140925 

4  0.3481903305117164  0.2267079304352431  0.8490389123371728  0.5392174642971602 

5  0.7463993492610801  0.9110735467441307  0.5632729568178561  0.3206173145497258 

J  =  2 

n  K=0  K  =  1  K=2  K  =  3 

1  0.4427650515811010  0.5914546861359178  0.5521103190044319  0.3091089444858068 

2  0.8686519557453064  0.7583257125438025  0.8533557852483856  0.3914039243818549 

3  0.1563428437123182  0.3640693252312788  0.6342647338371971  0.5238384394898848 

4  0.6482885286303744  0.1720547437686925  0.3808033389892215  0.4007749431199457 

5  0.6397645576683257  0.1333271349198305  0.1869680363210801  0.2208377351943116 

J  =  3 

n  K=0  K  =  1  K=2  K  =  3 

1  0.9978300359681934  0.5253459580347650  0.6520434364333648  0.6159024180887417 

2  0.7362007598126519  0.6147738265072652  0.3458691068685823  0.7802136705105448 

3  0.1792665691972314  0.1222897195688772  0.8401856953873755  0.4626120757175254 

4  0.9976679115116637  0.6914913258609442  0.0826153155550903  0.9618768282008134 

5  0.5677754619209018  0.6548310487828527  0.9569026110636841  0.8739543062290190 

J  =  4 

n  K=0  K  =  1  K=2  K  =  3 

1  0.8514670054420896  0.0056707647037416  0.7493998407895588  0.3463311054458345 

2  0.5267263347498869  0.3588759601578639  0.3048360663576928  0.5147900427173518 

3  0.9066455051530885  0.6588028963715689  0.3138276160556613  0.3858707360408210 

4  0.0811939121889528  0.8576597758170588  0.3365798017919254  0.6914505299662046 

5  0.3221491658567590  0.7343208133332034  0.3059111728404138  0.6182186114554359 

J  =  5 

n  K=0  K  =  1  K=2  K  =  3 

1  0.0471899299473179  0.3986013540042530  0.6850691847459924  0.4953327913844295 

2  0.5585498783838203  0.7158813002469847  0.6301341212618828  0.3849519385892373 

3  0.3837128053572065  0.5774402717408644  0.8078600134068169  0.3499545773298113 

4  0.8867653953978253  0.1585030627425553  0.1430469390534205  0.6974268222843730 

5  0.9085370114369585  0.2894573888502388  0.6155230210928202  0.4459162491806613 

J  =  6 

n  K=0  K  =  1  K=2  K  =  3 

1  0.6777254902909791  0.6501971176463415  0.9175734446451357  0.0883833539976003 

2  0.4426710194545124  0.7473840882779974  0.0749107104419934  0.4769209354309674 

3  0.7855820197949068  0.6757872579980055  0.9905054295595831  0.5115141324208419 

4  0.3059981132277549  0.3127235609434382  0.4321647727528529  0.3559428662024322 

5  0.0326120199440846  0.9819283513749614  0.5718934782007183  0.7160423702180720 

J  =  7 

n  K=0  K  =  1  K=2  K  =  3 

1  0.9834161512030755  0.8918538052248174  0.9352166367990679  0.7535270267229011 

2  0.4281479683744526  0.1729339905995289  0.2221815672287007  0.5651877439869093 

3  0.3869639544536436  0.4593579604445435  0.4371047161908056  0.9279065592691400 

4  0.9972041589918688  0.4266355518026803  0.5680244308353208  0.5823531281275650 

5  0.4311943935969865  0.5740132792779360  0.3088625144797826  0.7829598307054879 
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9.1  T258  Class  Code 


typedef  uint64_t 

uz ; 

// 

class  rng  { 

// 

public: 

rng(uz  seed); 

rng&  init() ; 

rng&  t(); 

rng&  jump(uz 

p); 

rng&  jumpk(uz 

k, 

uz  p)  ; 

uz  gen() ; 

double  u01(); 

static  const 

int 

IS  =  86; 

static  const 

int 

JL  =  172; 

bool  operator !=(const  rng&  w) ; 
bool  operator== (const  rng&  w) ; 
private : 

uz  s[5] ;  // 

uz  seed;  // 

void  seedx(void);  // 

static  const  int  CPd[5] ;  // 

static  const  uz  JP [63] [5] ;  // 


rng::rng(uz  seed)  {  // 

this->seed  =  seed; 
initQ  ; 


rng&  rng: :init()  {  // 

seedx() ; 

tO; 

return  *this; 


void  rng::seedx()  {  // 

uz  c  =  0x27bb2ee687b0b0fd,  d  = 
do  s [0]  =  c*seed+d;  while(s[0] 
do  s[l]  =  c*s[0]+d;  while(s[l] 
do  s[2]  =  c*s[l]+d;  while(s[2] 
do  s[3]  =  c*s[2]+d;  while(s[3] 
do  s[4]  =  c*s[3]+d;  while(s[4] 

} 


64-bit  unsigned  integer 
T258 

//  constructor 
//  (re) initialize 
//  state  transition 
//  jump  2Ap 
//  jump  k*2Ap 
//  integer  generator 
//  u(0,l)  generator 
//  log_2  (  small  jump  ) 
//  log_2  (  large  jump  ) 
//  state  comparison 
//  state  comparison 

LFSR  state  array 
this. seed 
set  seed 
CP  degrees 
jump  polynomials 


constructor 


initialize 


set  LFSR  seeds 
0x891087b8e3b70cbl; 

<  0x000002) ; 

<  0x000200) ; 

<  0x001000) ; 

<  0x020000) ; 

<  0x800000) ; 


rng&  rng 

:t()  {  //  T258:  C 

s 

q 

k-s 

// 

k 

s  [0] 

=  C(s[0] 

&  -0x000002) 

« 

10) 

A 

(C(s[0] 

« 

i) 

A 

s[0]) 

» 

53); 

// 

63 

s[l] 

=  C(s[l] 

&  -0x000200) 

« 

5) 

A 

(((s[l] 

« 

24) 

A 

s[l]) 

» 

50); 

// 

55 

s  [2] 

=  C(s[2] 

&  -0x001000) 

« 

29) 

A 

(  C (s [2] 

« 

3) 

A 

s[2]) 

» 

23); 

// 

52 

s[3] 

=  C(s[3] 

&  -0x020000) 

« 

23) 

A 

(  C  (s  [  3  ] 

« 

5) 

A 

s  [  3] ) 

» 

24); 

// 

47 

S  [4] 

=  C(s[4] 

&  -0x800000) 

« 

8) 

A 

(C(s[4] 

« 

3) 

A 

s  [4] ) 

» 

33); 

// 

41 

return  *this; 

} 

rng&  rng::jump(uz  p)  {  //  jump  2Ap 

uz  i,  j,  a[5]  =  {  0  }; 
for(i=l;  i;  i«=l,  t()) 
for(j=0;  j<5;  j++) 

if(JP[p  %  CPd[j]][j]  &  i)  a[j]  A=  s[j]  ; 
memcpy(s,  a,  5*sizeo£(uz)) ; 
return  *this; 


rng&  rng: : jumpk(uz  k,  uz  p)  {  //  jump  k*2Ap 

for(;  k;  k»=l,  p++)  if(k  &  1)  jump(p); 
return  *this; 

} 


uz  rng::gen()  {  //  64-bit  integer 

t() ; 

return  s[0]  A  s[l]  A  s[2]  A  s [ 3]  A  s [4] ; 

} 


double  rng::u01()  {  //  U(0,1) 

uz  z ; 

const  static  double  d  =  1 . 0/(lUL«53)  ; 
do  z  =  gen()  »  11;  while (!z); 
return  d  *  z; 

} 
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bool  rng: : operator !=(const  rng  &w)  {  //  state  inequality 
uz  e  =  0 ; 

for(int  i=0;  i<5;  i++)  e  |=  s[i]  A  w.s[i]; 
return  e ; 

} 

bool  rng : : operator==(const  rng  &w)  {  //  state  equality 
return  !  (  *this  !=  w  ); 

} 

const  int  rng::CPd[5]  =  {  63,  55,  52,  47,  41  };  //  CP  degrees 


const  uz  rng : : JP[63] [5]  =  {  //  3P[p][i]  =  2Ap  jump  polynomial  for  LFSR[i] 

{0x0000000000000002,  0x00000000000002,  0x0000000000002,  0X000000000002,  0X00000000002}, 

{0x0000000000000004,  0x00000000000004,  0X0000000000004,  0X000000000004,  0X00000000004}, 

{0X0000000000000010,  0X00000000000010,  0X0000000000010,  0X000000000010,  0X00000000010}, 

{0X0000000000000100,  0X00000000000100,  0X0000000000100,  0X000000000100,  0X00000000100}, 

{0X0000000000010000,  0X00000000010000,  0X0000000010000,  0X000000010000,  0X00000010000}, 

{0X0000000100000000,  0X00000100000000,  0X0000100000000,  0X000100000000,  0X00100000000}, 

{0x0000008000004006,  0x20000200100200,  0x080805414d000 ,  0x02009262d390 ,  0x00004800000}, 

{0X0020000000018014,  0X20050881500001,  0x5209f6b064797 ,  0x54af 12d5c26c ,  0x00000004920}, 

{0X00001801C00C011 0,  0x02 80d09c 1000 19 ,  0x0c54ale62 1499 ,  0x3d96aeb24aa5 ,  0x00010410400}, 

{0X700000C007816105,  0xlc8104908c00cl,  0xb05ba8446e76e ,  0x3081fec2623c ,  0x00100148048}, 

{0x7e3fe03e0000e011 ,  0x002631e50851cf ,  0xbba8fed05eb47 ,  0x702b58a6b6el ,  0x11044801040}, 

{0x7ff81f40fbf06105,  0x0d8e23 ldl49309 ,  0x5c23bbbc5da56 ,  0x24083a6fe8b7 ,  0x08481485932}, 

{0x0030000000008012,  0x61b24de97c0da3 ,  0xed52c8defa3df ,  0x5065b92cd5bc ,  0xl60586101cc} , 

{0x00001e00e00f0104,  0x00aa994928aec9,  0xfb065085eac50 ,  0x77e8cl7a05f8 ,  0x0db004cf018} , 

{0x7e00004007f92013 ,  0x4calb5574dd946,  0x288336225032a,  0x0a0b2fb86260 ,  0xl3de3800108} , 

{0x7fe7fffefc006105 ,  0x23990533099a2d,  0x6e57aa7334925 ,  0x60057f 19fef8 ,  0x097edfbl7f2} , 

{0x0007f8ffc3fd6016 ,  0x275a2982e86cl3 ,  0xl263bf9acdbb6 ,  0x2a7d2426007b ,  0x061deee38fe} , 

{0x702fffbefc7e2115 ,  0x37e54836f269el ,  0x4bfe07bl95bb6 ,  0x017bbf224b71 ,  0x00e00adae2e} , 

{0x01f00681200dcll6,  0x57eb3124fac097 ,  0xl67a720daf 502 ,  0x731 Ib420e038 ,  0x0503e540566} , 

{0x0e27fefee387cll2 ,  0x04b5b746efc071 ,  0xc707bfd0el5a6,  0x5 16e04fd38cc ,  0xl5c8006ee5c} , 

{0x7fd7f8ffc40b6103 ,  0x08a943bd59e7bf ,  0xb0e2f0e8a5511 ,  0x50f342c7f5dd,  0x0d9bd741142} , 

{0x0fc7e63f238c2004 ,  0xl9067d8a48d883 ,  0x2862328837clc ,  0x70ead9760ba8 ,  0x07da4b6fcde} , 

{0x0ff800000077e012 ,  0xll72e358700271 ,  0x63d6501573b4d,  0x7775288793a0 ,  0xl4bb8e21866} , 

{0x01ffe03f03ffcl04 ,  0x30alf72072a741 ,  0xa4c006bb09430 ,  0x4f2d355c7f98 ,  0x0cd36f7b57c} , 

{0x0000000000012012,  0x5edc4fa65cd612 ,  0xe88c757d05b63 ,  0x0c8b487d8528 ,  0x07c2987ce2a} , 

{0x0000000104000104,  0x7f0672769e600c ,  0x234ecd09c6285 ,  0x38310a7ff929 ,  0x04ab8472 le4} , 

{0x0010008000014016,  0x4a271b24al5c9a,  0x5825 170abee2c ,  0x4860388bc30a ,  0x05c3736f458} , 

{0x0020060120028114,  0xl733ede9dldfa0,  0x2816247ecl6fl ,  0x46447123e58f ,  0xl59bd06cfd2} , 

{0x0e001881c0754116,  0x3a35cable5e287,  0xaa7dad2018eed,  0x0ca6a442abe4 ,  0x0d9f5e2bd96} , 

{0x71d81f01fb80elll ,  0x48dla4f8d0e015 ,  0x43db4dee649bb ,  0x2940aa4aea57 ,  0xl69e0af8cee} , 

{0x01c00700380cal04 ,  0x5adaaead5e39al ,  0x5e5d41686bl86,  0x2b537911 lbl8 ,  0x08e65aa5574} , 

{0x0fc7e07f078f0010 ,  0x680a74c2db72ab ,  0xele7e9e257df7 ,  0x70d5e911 lb6e ,  0xl656b4b58a2} , 

{0x01e00000000f8102 ,  0x526b6108440381,  0xl45917717f794 ,  0x2 1735e75c963 ,  0x09f5d9b72de} , 

{0x0007f87fc3fd0004,  0xl0d8b20cc2dl6d,  0xla079270e423e ,  0x598fll394622 ,  0x060ebc9bcf4} , 

{0x700fffbef87fa011 ,  0x6ce992764e3e06 ,  0xeed334eb23c6f ,  0x5f36f89389ba ,  0x01b442b62f8} , 

{0x01e01e80e000cl06,  0xl251a9fa5ac371 ,  0xle0d55467df96 ,  0x06790d966652 ,  0x040a2ccld52} , 

{0x7e27f83fc404a017 ,  0x75bclfdl9772e4 ,  0x095ae04d250fb,  0x788a4bb5b473 ,  0x0180063026c} , 

{0x0fe81840c473cll4 ,  0x0ee9c38438b036 ,  0x813bec99acd9d,  0x7f 5d9d9bfe68 ,  0x000a004140a} , 

{0x71e7e63f247d8113 ,  0x58b2accldf8c9a,  0xl9ef238355322 ,  0x339c5a67480c ,  0x01000220044}, 

{0x70 17e7be387ae 107 ,  0xlafecbeb09a982 ,  0x91dla9b679f82 ,  0x396e0fc828a2 ,  0x00080001002}, 

{0x71e01901df822011 ,  0x263665739af91f ,  0xf6a6f3cea50b0 ,  0x2 Id74a9656e8 ,  0x00000200004}, 

{0x0fd01881c07a6104 ,  0x75394e09c4b2ca,  0x5cb6b8cle7ff4 ,  0x6188995f3157} , 

{0x71dff97edc73e015 ,  0x0a7d90ee25a348,  0x61688699db586 ,  0x3fb7dd50a418} , 

{0x0fdff8ffc78a2116,  0x5286ab7e83ae62 ,  0xda9574e678f90 ,  0x06ce5db5ad5c} , 

{0x71c007813b8cell5 ,  0x4d3bcdf8d85adc ,  0x5c23f978ed0f3 ,  0xle2c621f971a} , 

{0x71e000c0078ecll7,  0x017dde9b20b0d5 ,  0x6cf46e5a6f0f7 ,  0x7a505dc81443} , 

{0x7e381841c7fcell5 ,  0x027f464d66ea0c ,  0xac0db23cd9139 ,  0x2613f7dal7b5} , 

{0x0fe7e0ff038ecll4 ,  0xl3c4979887ba9d,  0x5db01e5be4eee} , 

{0x01d01801c0030112 ,  0x4cb8bb21bf4304,  0xb6f0039dacb41} , 

{0x7007e6bf24726101 ,  0xl6b8f3440cf01a,  0xe32af56d4470e} , 

{0x70301fclff876003,  0x4e3925 14dce977 ,  0xd2ae9a043e26a} , 

{0x01fffebfe7f08000,  0x005c0f0080a00a,  0xc2b205c5ebele} , 

{0x7e30004007f98001 ,  0x29112080522885}, 

{0x7fe7elfel80f6001 ,  0x3884184171249c} , 

{0x7el7f8bfc4044005 ,  0x0400c0080900e0} , 

{0x0fc80640207d4010}, 

{0x0ff7fe7ee388al00} , 

{0x7ff01e80e3f9e007} , 

{0x01d801cllc06a016} , 

{0x01ffe73e3bf46112} , 

{0x0fc000c0047f6100} , 

{0x01c7fffffff0e000} , 

{0x7ff81fc0fff02001} 

}; 
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9.2  Driver  Code 


#include 

#include 

#include 

#include 

#include 

#include 

#include 

#include 


<iostream> 

<iomanip> 

<sstream> 

<cstdlib> 

<cstring> 

<unistd.h> 

<stdint.h> 

"t258.h" 


using  namespace  std; 

#define  Px(x)  "0x"  «  setw(16)  «  setfill(’0’)  «  hex  «  (x) 

#define  Pu(x)  setw(18)  «  setprecision(16)  «  setfillC’  ’)  «  fixed  «  dec  «  (x) 

void  Vtab(uz  seed,  bool  qz)  { 

int  NL  =  8 ,  NS  =  4,  n  =  5 ;  //  #  large,  small,  sample 

rng  *z[NL][NS];  //  stream  and  substream  rngs 

for (int  i=0;  i<NL;  i++)  //  large  jumps 

for(int  j  =0 ;  j<NS;  j++)  {  //  small  jumps 
z[i][j]  =  new  rng(seed); 

z [i ] [j]->init() . jumpk(i,  rng : : 1L) . jumpkfj ,  rng::lS); 


cout  «  "Vtab("  «  Px(seed)  «  ",  "  «  (qz  ?  "uz":  "u01")  «  ") :  " 
«  "jump  =  J*2A"  «  dec  «  rng::JL  «  "  +  K*2A"  «  rng::lS 
«  endl  «  endl ; 


for (int  i=0;  i<NL;  i++)  {  //  large  jumps 

cout  «  setw(22)  «  setfill  (’  ’)  «  "J  =  "  «  setw(3)  «  i  «  endl; 

cout  «  "  n  " ; 
for (int  j  =0 ;  j<NS;  j++) 

cout  «  setw(16)  «  "K  =  ”  «  setw(3)  «  j ; 
cout  «  endl ; 


for (int  k=l;  k<=n;  k++)  {  //  samples 

cout  «  setw(3)  «  dec  «  setfillC  ’)  «  k  «  " 
for (int  j=0;  j <NS ;  j++)  //  small  jumps 
if (qz) 

cout  «  "  "  «  Px(z[i] [j]->gen()) ;  //  integer 

else 

cout  «  "  "  «  Pu(z[i] [j]->u01()) ;  //  U(0,1) 
cout  «  endl ; 

} 

cout  «  endl ; 

} 

} 

void  Vjump(uz  seed,  uz  pmax)  {  //  inductive  jump  verification 

rng  z0(seed),  zl(seed);  //  T258  objects,  same  state 

cout  «  "Vjump("  «  Px(seed)  «  ",  "  «  dec  «  pmax  «  "): 

if(  z0.jump(0)  !=  zl.tO  )  {  //  base 
cerr  «  "fail  base  check"  «  endl; 
exit(l) ; 

} 

for(uz  p=l;  p<=pmax;  p++  )  //  induction 

if(  z0 .  initO  .  jump(p)  !=  zl . init() . jump(p-l) . jump(p-l)  )  { 
cerr  «  "fail  induction  check,  p="  «  dec  «  p  «  endl; 
exit(l) ; 

} 

cout  «  "for  all  p  =  0:"  «  pmax  «  "  jump(p)  ==  tA(2Ap)"  «  endl; 


int  main  (int  argc  ,  char*  argv[])  { 

uz  seed  =  0xd8940e83ec602c7b;  //  rng  seed  default 
uz  pmax  =  172  +  64; 

Vtab(seed,  true); 

Vtab(seed,  false); 

Vjump(seed,  pmax); 

} 
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10.  Conclusions  and  Recommendations 


The  T258  RNG  passes  tests  for  randomness  and  provides  high-resolution  53-bit 
random  real  numbers. 

T258  class  RNG  objects  are  intrinsically  thread-safe  in  the  sense  that  instances  do 
not  interact  and  are  thus  computationally  independent.  But  this  is  not  sufficient  for 
statistical  independence. 

The  enumeration  of  threads  and  initialization  of  thread  RNG  streams  at  offsets  of 
2172  ensures  that  the  286  available  threads  are  statistically  independent  and  that  com¬ 
putation  are  easily  reproducible.  Within  threads,  increments  of  286  provide  statisti¬ 
cal  independence  of  286  stochastic  quantities  with  stream  length  286.  Verification  of 
the  jump  computations  guarantees  these  independence  properties. 

When  C++ 11  compilers  become  widely  available,  a  T258  engine  can  be  incorpo¬ 
rated  into  the  C++1 1  <random>  library  framework.  Care  must  be  taken  to  preserve 
the  independence  properties.  Otherwise,  as  noted  in  Section  1,  no  sets  of  quantities 
within  or  among  threads  can  be  claimed  to  be  independent  random  samples. 
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List  of  Symbols,  Abbreviations,  and  Acronyms 


CDF 

cumulative  distribution  function 

CP 

characteristic  polynomial 

LCG 

linear  congruential  generator 

LFSR 

linear  feedback  shift  register 

QT 

QuickTaus 

RNG 

random  number  generator 

STL 

Standard  Template  Library 
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